Abstract. ML4PG is an extension to the Proof General interface, allowing
the Proof General user to invoke machine-learning algorithms
and cluster proofs and proof libraries. In this paper, we show three
benchmarking examples of proof-pattern recognition across different
libraries, notations, users, data types and lemma shapes. In the first
example, ML4PG is used to detect non-trivial patterns arising in proofs 
across SSReflect libraries for Linear Algebra, Combinatorics and Persis- 
tent Homology. In the second and more applied example, it is used to 
help in the formulation of auxiliary lemmas when adapting the Computer 
Algebra methodology of CoqEAL to new domains. Finally, the third, 
industry-related example shows ML4PG clustering proofs of properties 
of the Java Virtual Machine in the scenario of team proof-development. 
Keywords: Interactive Proofs, Coq, SSReflect, Machine Learning, Clus- 
tering. 



1 Introduction 

Donald Knuth famously compared Computer Programming to Art [15]. This
comparison still holds for Interactive Theorem Provers (ITPs) [3]. Successful
and efficient ITP programming relies on previous experience and the ability to
"creatively" adapt already used proof techniques and patterns in newly
constructed proofs. This often requires a combination of mathematical and
programming intuition; see e.g. [12]. This explains why the "steep learning
curve" is often mentioned as one of the big obstacles to wider adoption of ITPs
by professional mathematicians and industry alike.

In this paper, we are making the first steps towards automated detection of 
significant proof patterns in Coq/SSReflect [9]. Our main goal is to prove the
concept: it is possible to embed a lightweight statistical machine-learning tool 
into an ITP proof interface, and use it interactively to find non-trivial patterns 
in existing proofs and to aid new proof development. Related work on using 
machine-learning in ITPs concerned hints in lemma generation for Isabelle/HOL 
[13], proof strategy discovery in Isabelle/HOL [1], speed-up in proof automation
in HOL-Light [14] and statistical tactic analysis [8] in Isabelle. 

We have described, in a companion paper [16], ML4PG - an extension to 
Proof General that automatically clusters Coq/SSReflect proofs. The ML4PG 
package for Proof General features three main functions: 
F1. it works in the background of Proof General, and extracts some simple,
low-level features from interactive proofs in Coq/SSReflect;

F2. it automatically sends the gathered statistics to a chosen machine-learning
interface and triggers execution of a clustering algorithm of the user's choice;

F3. it does some gentle post-processing of the results given by the machine- 
learning tool, and displays families of related proofs to the user. 

Section 2 gives an overview of ML4PG; full details of its implementation
are given in [16]. In this paper, we do not focus on the ML4PG implementation
per se, although we use it for all the examples and experiments shown here.
Our main goal is to show how useful automated proof-pattern detection can be.
For this purpose, we devise three experiments to test ML4PG. Each example is
designed to demonstrate a different aspect of proof-pattern recognition. ML4PG
and all examples presented throughout this paper are available in [11].

Section 3 focuses on discovery of proof patterns in mathematical proofs
across formalisations of apparently disjoint mathematical theories: Linear Alge- 
bra, Combinatorics and Persistent Homology. In this scenario, we use statistically 
discovered proof patterns to advance the proof of a given "problematic" lemma. 
In this case, a few initial steps in its proof are clustered against several mathe- 
matical libraries. In our example, the lemma in question is related to nilpotent 
matrices [5]. This section contains a detailed description of how ML4PG was 
used to discover some non-trivial proof patterns among 750 lemmas across 5 
libraries, and how the detected proof clusters were used to advance the proof for 
this lemma. Notably, ML4PG discovered that the fundamental lemma of
Persistent Homology [10], a result from a completely different context, follows
a proof strategy that also suits the proof of our lemma.

Section 4 tests ML4PG's functionality in a different area - verification of
Computer Algebra algorithms as suggested by the CoqEAL methodology [7]. It
is a different - but equally common - scenario of proof development: CoqEAL
gives a general methodology to follow, but the exact role of over 1000 proofs and
definitions in the CoqEAL library may be unclear to us. In this case, we equally need
guidance in lemma formulation, not only in proofs. In this section, we consider 
various proofs concerning a fast algorithm to compute the inverse of triangular 
matrices over a field. Again, ML4PG was able to discover significant clusters, 
and pointed us exactly to the results which could be used as hints to formulate 
the necessary lemmas and complete the proofs. 

Section 5 takes one further step from mathematical to industrial applications
of Coq and ML4PG. For this purpose, we chose the proofs of correctness of
the Java Virtual Machine (JVM) given in [12]. The industrial scenario of
interactive theorem proving may differ significantly from the mathematical
scenario above. Namely, industrial verification tasks often feature a bigger
number of routine cases and similar lemmas; such tasks are also distributed
across a team of developers. Here, the inefficiency of automated proving often
arises when programmers use different notation to accomplish very similar tasks,
and thus a lot of work gets duplicated, see also [6]. We tested ML4PG in exactly
such a scenario:



we assumed that a programming team has collectively developed proofs of a)
soundness of specification, and b) correctness of implementation of Java byte
code for a dozen programs computing multiplication, powers, exponentiation,
and other functions. Next, we show how ML4PG discovered common patterns
among these proofs and relevant lemmas (around 150 training examples in
total). The suggested clusters indeed helped to advance the proofs of properties
a) and b) for the Java byte code of the factorial function.

These three examples show that statistically discovered proof clusters can
find patterns in proofs across different libraries, theories and even users.
ML4PG works in the background of Proof General and, if called, provides
clustering results almost instantly; thus, it can be used interactively, as a
handy tool on request. ML4PG can speed up proof development by suggesting
reusable proof strategies, providing non-trivial suggestions about analogies
between fragments of apparently unrelated libraries, and detecting common
patterns in proofs developed by a team. Finally, it may be used for educational
purposes, as automated proof-pattern recognition may help to smooth the
learning curve.

2 Automated Proof-Pattern Discovery with ML4PG 

In this section, we give a brief introduction to proof-pattern recognition with
ML4PG; full details of implementation can be found in [16,11]. We will use this
section to explain the role of features F1-F3 mentioned in the introduction.
Here, we also introduce two technical improvements to [16]: dynamic lemma
numbering and the proof patch method.

Example 1. Consider the following example of a lemma about number series: 
Lemma 1 If $g : \mathbb{N} \to \mathbb{Z}$, then

$$\sum_{0 \le i \le n} \bigl(g(i+1) - g(i)\bigr) = g(n+1) - g(0).$$

and its proof in Table 1. Note that important proof features could be gathered
in relation to subgoal shapes (left column), as well as tactics (right column). 

The discovery of statistically significant features in data is a research area of 
its own in machine-learning, known as feature extraction, see [4]. Irrespective of 
the particular feature-extraction algorithm used, most pattern-recognition tools 
will require that the number of selected features is limited and fixed. In ML4PG, 
we design and implement our own method of proof feature extraction. There were 
two major challenges: 

C1. To make the feature extraction method general enough to work with
interactive proofs of any nature and complexity.

Example 2. One could consider general goal features such as "goal shape" (e.g.
"associative-shape" or "commutative-shape"), or properties like "the subgoal
embeds a hypothesis" or "the subgoal is embedded into a hypothesis". However,
elim: n => [|n _].
  by rewrite big_nat1.
rewrite sumrB big_nat_recr big_nat_recl addrC
        addrC -subr_sub -!addrA addrA.
move: eq_refl; rewrite -subr_eq0; move/eqP => ->.
by rewrite sub0r.
Qed.

Table 1. Proof of Lemma 1, $\sum_{0 \le i \le n} (g(i+1) - g(i)) = g(n+1) - g(0)$, in SSReflect.



gathering such features uniformly across any set of proofs would be hard, espe- 
cially when working with richer theories and dependent types, where interesting 
cases of shapes and embeddings may not be automatically resolved by first-order 
unification; while some such cases would apply to one proof but not another. See 
Table 1: none of the features mentioned above determines the proof flow.

C2. To respect the restriction on the fixed size of the feature vectors while
still allowing data-mining of higher-order proofs and formulas of varied length.

To address challenges C1 and C2, we designed a method of implicit tracking
of proof properties, called the proof trace method. First, ML4PG automatically
tracks simple, low-level properties that apply to any possible subgoal, e.g. "the
top symbol" or "the argument type". Further, these shallow features are taken
in relation to the statistics of user actions on every subgoal: how many and what
kind of tactics she applied, and what kind of arguments she provided to the
tactics. Finally, a few proof steps are taken in relation to each other. Table 2
illustrates the process of forming such feature vectors.

Thus, the proof trace method lets the lemma structure show itself through 
the proof steps it induces. Note that, using the two dimensions of Table 2, we
gather statistics both dynamically (considering several proof steps instead of just 
one) and relationally (tracking correlation of statistics concerning goal shapes 
and applied tactics within a few proof steps). An advantage of this method is 
that it applies uniformly to any Coq library. ML4PG extracts all proof features 
during Coq compilation. 
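
The following sketch illustrates how a fixed-size feature vector could be assembled from the first five proof steps. It is not ML4PG's implementation (which lives in Emacs): the `ProofStep` record, the choice of five features per step, and the sequential numbering in `encode` are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical per-step record; field names are illustrative, not ML4PG's.
@dataclass
class ProofStep:
    tactics: list      # names of the tactics applied at this step
    arg_types: list    # types of the arguments passed to the tactics
    top_symbol: str    # top symbol of the current goal
    n_subgoals: int    # number of subgoals generated by the step

STEPS_PER_VECTOR = 5   # five proof steps per vector, as described in the text
FEATURES_PER_STEP = 5  # an assumption of this sketch

def encode(symbol, table):
    """Assign a number to each distinct symbol and reuse it consistently."""
    return table.setdefault(symbol, len(table) + 1)

def step_features(step, table):
    """Shallow features of one proof step, all converted to numbers."""
    tactic = encode(step.tactics[0], table) if step.tactics else 0
    arg = encode(step.arg_types[0], table) if step.arg_types else 0
    top = encode(step.top_symbol, table)
    return [tactic, len(step.tactics), arg, top, step.n_subgoals]

def feature_vector(steps, table):
    """Concatenate the features of up to five steps, padding with zeros."""
    vec = []
    for step in steps[:STEPS_PER_VECTOR]:
        vec.extend(step_features(step, table))
    vec.extend([0] * (STEPS_PER_VECTOR * FEATURES_PER_STEP - len(vec)))
    return vec
```

Sharing one `table` across all proofs is what keeps the numbering of tactics and symbols consistent between feature vectors.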
Table 2. A table illustrating the work of ML4PG's feature extraction
algorithm for the proof in Table 1. Parameters inside the double lines are the
extracted features used to form the feature vectors. Notation g1-g5 is used to
denote five consecutive subgoals in the derivation. Columns are the properties of
subgoals the feature extraction method tracks: the names of applied tactics,
their number, types of arguments, link of the proof step to a hypothesis (Hyp),
inductive hypothesis (IH) or a library lemma (EL); top symbol of the current
goal, and the number of the generated subgoals.



Clustered proofs may have varied length; however, challenge C2 restricts us
to a fixed number of features. As Table 2 shows, the default feature extraction
algorithm implemented in ML4PG takes five proof steps to form one feature
vector. If the proof is longer, it will form separate feature vectors for proof
steps 6-10, 11-15, and so on, using the same feature extraction algorithm as
applied to form Table 2. Additionally, ML4PG always extracts features from the
last five proof steps the user makes. We call this method the proof patch method,
as it allows one to cluster longer proofs by statistically analysing their
fragments. Potentially, one small proof may resemble a fragment of a bigger
proof, see Example 3.
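
The splitting of a proof into patches can be sketched as follows. This is a minimal illustration under our own assumptions: the function name `proof_patches` is hypothetical, and we chose to skip the final patch when it merely duplicates a patch already produced.

```python
def proof_patches(steps, size=5):
    """Split proof steps into consecutive patches of `size` steps,
    and add the last `size` steps as an extra patch."""
    if not steps:
        return []
    patches = [steps[i:i + size] for i in range(0, len(steps), size)]
    tail = steps[-size:]
    if tail not in patches:  # avoid duplicating an existing patch
        patches.append(tail)
    return patches
```

For a proof with twelve steps, this yields the patches for steps 1-5, 6-10, 11-12 and, finally, the last five steps 8-12.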

To prepare the output of the feature extraction algorithm of Table [2] for 
statistical data-mining, ML4PG converts all the features into numbers. E.g., 
every tactic, type of a tactic argument, and a top symbol of a subgoal is assigned 
some random number; and then this number is used consistently across all proofs. 
The most significant algorithm in this numerical conversion concerns lemma
numbering, see the fourth column of Table 2. It has an effect on clustering
results, especially if ML4PG works with big proof libraries and the lemma
numbering gives a big value spread. To avoid the inaccuracies of blind lemma
numbering, ML4PG implements dynamic lemma numbering: during Coq compilation
and feature extraction, ML4PG starts by numbering the first two lemmas Coq
compiles, and then continues inductively to cluster more lemmas one by one,
re-numbering them according to the cluster proximity of one proof to another.
As a result, similar lemmas are assigned close values.

Once all proof features are extracted, ML4PG is ready to communicate with
machine-learning interfaces. Every machine-learning engine has its own concrete
format to represent feature vectors. ML4PG is built to be modular - that is, the
feature extraction is first completed within the Emacs environment, where the
data is gathered in the format of hash tables, and then these tables are converted
to the format of the chosen machine-learning tool (in our case, Matlab or Weka).
ML4PG transforms the feature vectors to a comma-separated values (csv) file in
the case of Matlab, or to arff files in the case of Weka. In principle, extending
the list of machine-learning engines does not require any further modifications
to the feature extraction algorithm, but just defining new translators.
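
A translator of this kind could look roughly as follows. The sketch is ours and does not reproduce ML4PG's Emacs code: the function names, the relation name `proofs` and the attribute names `f0`, `f1`, ... are hypothetical. Only the `@RELATION`, `@ATTRIBUTE` and `@DATA` keywords come from Weka's ARFF format.

```python
import csv
import io

def to_csv(vectors):
    """Render feature vectors as comma-separated values, one vector per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(vectors)
    return buf.getvalue()

def to_arff(vectors, relation="proofs"):
    """Render feature vectors as a minimal ARFF document with numeric attributes."""
    n_features = len(vectors[0]) if vectors else 0
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE f{i} NUMERIC" for i in range(n_features)]
    lines += ["", "@DATA"]
    lines += [",".join(str(x) for x in vec) for vec in vectors]
    return "\n".join(lines) + "\n"
```

Both translators consume the same list of numeric vectors, which reflects the modular design described above: adding an engine means adding one more function of this shape.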

Next, ML4PG invokes the machine-learning engine. The ML4PG mechanism
connecting to machine-learning interfaces is similar to the native mechanism of
Proof General used to connect to ITPs. Namely, there is a synchronous
communication between ML4PG and the machine-learning interfaces, which run in
the background waiting for ML4PG calls. The user can choose additional proof
libraries to be clustered against the current proof: these libraries must be
exported with the mechanism provided by ML4PG.

The next configuration option ML4PG offers is the choice of the particular
pattern-recognition algorithm. We connected ML4PG only to clustering
algorithms [4] - a family of unsupervised learning methods. Unsupervised learning
is chosen when no user guidance or class tags are given to the algorithm in
advance. There are several clustering algorithms available in Matlab (K-means
and Gaussian) and Weka (K-means, FarthestFirst and Expectation Maximisation);
we will use several of them in the coming sections. Clustering techniques divide
data into n groups of similar objects (called clusters), where the value of n is
a "learning" parameter provided by the user. Increasing the value of n means
that the algorithm will try to separate objects into more classes and, as a
consequence, each cluster will contain fewer examples but with higher correlation.
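
To make the role of n concrete, here is a minimal K-means sketch written from the textbook description of the algorithm. It is not the Matlab or Weka implementation that ML4PG calls; the function names and the fixed number of iterations are our own choices.

```python
import random

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, n, iterations=20, seed=None):
    """Group `points` into `n` clusters; returns one cluster label per point."""
    rng = random.Random(seed)
    # Start from n randomly chosen examples, as clustering algorithms typically do.
    centroids = [list(p) for p in rng.sample(points, n)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(n), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(n):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(vectors, n)` with a larger n splits the same feature vectors into more, and therefore smaller, groups.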

Various numbers of clusters can be useful for proof mining: this may depend 
on the size of the data set, and on existing similarities between the proofs. 
ML4PG accommodates such choices. In general, small values of n are useful when 
searching for general proof patterns which can later be refined by increasing the 
value of n. However, extreme values are to be avoided: small values of n can 
produce meaningless over-sized clusters; whereas trivial clusters with just one 
proof may be found for big values of n. 

In the machine-learning literature, there exist a number of heuristics to
determine the optimal number of clusters [5D]. We used them as an inspiration to
formulate our own algorithm for ML4PG, tailored to interactive proofs. It
takes into consideration the size of the proof library and an auxiliary parameter
we introduce here, called granularity. This parameter is used to calculate the
optimal number of proof clusters, using the formulas of Table 3. As a result, the
user does not provide the value of n, but just decides on the granularity in the
ML4PG menu, by selecting a value between 1 and 5, where 1 stands for a low
granularity (producing big and general clusters) and 5 stands for a high
granularity (producing small and precise clusters). See Example 4.
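
The idea of deriving n from the library size and the granularity can be sketched as below. The divisors used here are placeholders chosen by us for illustration; the actual formulas are those of Table 3 and may differ. The only property the sketch is meant to show is that a higher granularity yields more, and hence smaller, clusters.

```python
# Placeholder divisors, NOT the values of Table 3: illustrative assumptions only.
PLACEHOLDER_DIVISORS = {1: 10, 2: 9, 3: 8, 4: 7, 5: 6}

def number_of_clusters(n_lemmas, granularity, divisors=PLACEHOLDER_DIVISORS):
    """Number of clusters for a library of `n_lemmas` lemmas.

    A smaller divisor (higher granularity) gives more clusters, so each
    cluster contains fewer, more closely related proofs.
    """
    if granularity not in divisors:
        raise ValueError("granularity must be between 1 and 5")
    return max(1, n_lemmas // divisors[granularity])
```

With these placeholder values, a library of 750 lemmas would be split into 75 clusters at granularity 1 and into 125 clusters at granularity 5.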

Results of one run of a clustering algorithm may differ from another, even on 
the same data set. This is due to the fact that clustering algorithms randomly 
choose examples to start from and form clusters relative to those examples. 
However, some clusters are found repeatedly - and frequently - in different runs. 
To gather sufficient statistical data from proofs, ML4PG automatically runs the 
chosen clustering algorithm 200 times at every call of clustering, and collects the 
frequencies of each cluster. To judge reliability of clusters, the ML4PG user can 
Table 3. ML4PG formulas computing clustering parameters. Left: the formula
computing the number of clusters given the granularity value, where l is the
number of lemmas in the library. Right: the formula computing frequency thresholds
given a frequency parameter.



choose one of the three frequency thresholds shown in Table 3. If the frequency
of a cluster falls below the pre-set threshold, the corresponding proofs will not
be displayed to the user. High frequencies suggest a high correlation among the
proofs of a cluster; but lower frequencies can sometimes be useful in search of
less obvious statistical correlation between proofs, see Example 5.
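
Collecting cluster frequencies over repeated runs can be sketched as follows. The sketch assumes a hypothetical callable `run_clustering` that returns one list of clusters (each a collection of lemma names) per call; the threshold is passed in explicitly, since the concrete threshold values are those of Table 3.

```python
from collections import Counter

def cluster_frequencies(run_clustering, runs=200):
    """Run the clustering `runs` times and return, for every cluster seen,
    the fraction of runs in which exactly that cluster appeared."""
    counts = Counter()
    for _ in range(runs):
        for cluster in run_clustering():
            counts[frozenset(cluster)] += 1
    return {cluster: k / runs for cluster, k in counts.items()}

def frequent_clusters(frequencies, threshold):
    """Keep only the clusters whose frequency reaches the threshold."""
    return [set(c) for c, f in frequencies.items() if f >= threshold]
```

Representing each cluster as a `frozenset` makes two runs agree on a cluster exactly when they group the same lemmas, regardless of order.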

In addition to the frequency threshold, there is another parameter which
ensures the reliability of the clusters. Clustering algorithm output contains not
only clusters but also their proximity values. This measure ranges from +1,
indicating points that are very distant from other clusters, through 0, indicating
points that are not distinctly in one cluster or another, to -1, indicating
points that are probably assigned to the wrong cluster. We have fixed 0.5 as an
accuracy threshold, and all clusters whose measure is below this value are
ignored by ML4PG.
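
The description of the proximity measure matches the standard silhouette coefficient, so the sketch below assumes that this is the measure in question; the source does not name it. The function names and the averaging of point scores per cluster are our own choices.

```python
import math

def silhouette_by_cluster(points, labels):
    """Mean silhouette value of each cluster (assumed proximity measure)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = {lab: [] for lab in clusters}
    for p, lab in zip(points, labels):
        own = clusters[lab]
        others = [c for k, c in clusters.items() if k != lab]
        if len(own) < 2 or not others:
            scores[lab].append(0.0)  # common convention for degenerate cases
            continue
        # a: mean distance from p to the other members of its own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another cluster
        b = min(sum(math.dist(p, q) for q in c) / len(c) for c in others)
        denom = max(a, b)
        scores[lab].append(0.0 if denom == 0 else (b - a) / denom)
    return {lab: sum(s) / len(s) for lab, s in scores.items()}

def reliable_clusters(points, labels, threshold=0.5):
    """Labels of the clusters whose mean silhouette reaches the threshold."""
    scores = silhouette_by_cluster(points, labels)
    return sorted(lab for lab, s in scores.items() if s >= threshold)
```

Two tight, well-separated groups score close to +1, while a labelling that mixes the groups scores below 0 and would be discarded by the 0.5 threshold.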

Finally, after discarding clusters with low proximity and those which fall
below the pre-set threshold, the remaining clusters with lemma names and
frequencies are displayed on the Proof General panel.

3 Proof Patterns in Mathematical Proofs 

Development of interactive provers has led to the creation of big data sets 
of libraries and development of varied infrastructures for formal mathemati- 
cal proofs. However, these frameworks usually involve thousands of definitions 
and theorems (for instance, there are approximately 4200 definitions and 15000 
theorems in the case of the formalisation of the Odd Order theorem [18]). It 
is difficult to trace them to find patterns which could be reused in the proof 
of a new theorem. Therefore, a tool which could detect proof strategies arising 
in mathematical proofs across several libraries will make the proof development 
task easier. In order to illustrate this fact, let us consider the following two 
lemmas together with Lemma 1.

Lemma 2 Let $M$ be a square matrix and $n$ be a natural number such that
$M^n = 0$, then

$$(1 - M) \times \sum_{i=0}^{n-1} M^i = 1.$$



Lemma 3 Let $\beta_n^{k,l} : \mathbb{N} \times \mathbb{N} \times \mathbb{N} \to \mathbb{Z}$, then
These three lemmas come from different contexts. Lemma 2 states a result
about nilpotent matrices (a square matrix $M$ is nilpotent if there exists an $n$
such that $M^n = 0$). Lemma 3 is a generalisation of the fundamental lemma of
Persistent Homology; the actual lemma and its formalisation can be seen in [10].
Finally, Lemma 1 is a basic fact about summations.

When proving Lemma 2, it is difficult, even for the expert user, to get the
intuition that she can reuse the proofs of Lemmas 3 and 1. There are several
reasons for this. First of all, the formal proofs of these lemmas are in different
libraries (the proof of Lemma 2 is in a library about matrices, the proof of
Lemma 3 is in a library about Persistent Homology, and the proof of Lemma 1
in a library about basic results in summations); also, it is hard to establish
a conceptual connection among them. Moreover, although the three lemmas
involve summations, the types of the terms of those summations are different.
Therefore, search based on types or keywords would not provide any valuable
information. Even a search for all the lemmas involving summations is not useful,
since there are more than 250 such lemmas - too many to handle manually.

However, if Lemmas 3 and 1 are suggested when proving Lemma 2, the expert
would be able to spot the similarities among them, and notice the following proof
pattern.

Proof Strategy 1 

1. Apply induction on n. 

(a) Prove the base case (a simple task). 

(b) Prove the inductive case: 

i. expand the summation, 

ii. cancel the terms pairwise, 

iii. the only terms remaining after the cancellation are the first and the last.



For instance, using the above proof strategy in the case of Lemma 2, the
proof of the inductive case is as follows.
Proof.

$$(1 - M) \times \sum_{i=0}^{n-1} M^i = M^0 - M^1 + M^1 - M^2 + \cdots + M^{n-1} - M^n = M^0 - M^n = 1 - 0 = 1.$$

□
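
As a sanity check outside Coq, the statement of Lemma 2 can be tested numerically on a concrete nilpotent matrix. The sketch below uses plain Python lists; the helper names and the example matrix `M` are ours and have no counterpart in the SSReflect development.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_pow(m, n):
    result = identity(len(m))
    for _ in range(n):
        result = mat_mul(result, m)
    return result

def power_sum(m, n):
    """Return M^0 + M^1 + ... + M^(n-1)."""
    total = [[0] * len(m) for _ in m]
    power = identity(len(m))
    for _ in range(n):
        total = mat_add(total, power)
        power = mat_mul(power, m)
    return total

# A strictly upper triangular 3x3 matrix is nilpotent: here M^3 = 0.
M = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
lhs = mat_mul(mat_sub(identity(3), M), power_sum(M, 3))
```

Here `lhs` is the left-hand side of Lemma 2 for n = 3, and comparing it with `identity(3)` checks the statement on this one instance; it is of course no substitute for the formal proof.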



It is worth noting that we cannot blindly apply Proof Strategy 1, or the
tactics applied in the proofs of Lemmas 3 and 1, to the proof of Lemma 2, since
there are small nuances. Nevertheless, the general idea of the proof pattern can
be followed to finish the proof and, with some minor modifications, most of the
tactics can be reused.

ML4PG will suggest Lemmas 3 and 1 when proving Lemma 2 if we use the
following settings for statistical pattern recognition. First of all, we consider 5
SSReflect libraries for clustering: bigop (devoted to generic indexed big
operations like $\sum_{i=0}^{n} f(i)$ or $\prod_{i \in I} f(i)$), matrix, binomial
(which defines combinatorial notions and includes several results involving
summations), a small library about summations (which includes the proof of
Lemma 1) and the library about Persistent Homology formalised in [10]. These
libraries involve approximately 750 lemmas, so ML4PG will analyse a data set of
750 examples; here, we connect it to the K-means clustering algorithm in Weka.

Example 3 (Proof patches). Let us note that Proof Strategy 1 can be applied
twice in the proof of Lemma 3: firstly in the inner summation and subsequently
in the remaining proof. Then, 2 of the 15 suggestions provided by ML4PG in
this case come from the proof of Lemma 3.

The configuration of the granularity parameter can be approached in two 
different ways: 

- top-down: this approach suggests first using a small value for the granularity
to obtain a general proof pattern, and then refining that pattern by increasing
the granularity value.

- bottom-up: in this approach, a high value for the granularity is used to see
which lemmas are the most similar; the granularity value is then decreased
to see more general - and potentially less trivial - patterns.

Example 4 (Proof Granularity). In our case, we use the top-down approach,
starting with the default granularity value of 3. Using this value, ML4PG obtains
15 suggestions which are related to lemmas about summations, including Lemmas 3
and 1. Increasing the granularity level to 4, ML4PG discovers Lemmas 3 and 1
and also the following lemma.

Lemma 4 Let $M$ be a nilpotent matrix, then there exists a matrix $N$ such that
$N \times (1 - M) = 1$.

At first sight, the proof of this lemma does not seem to fit Proof Strategy 1,
since the statement of the lemma does not involve summations. However,
inspecting its proof, we can see that it uses $\sum_{i=0}^{n-1} M^i$ as witness
for $N$ and then follows Proof Strategy 1. In fact, if we use the highest
granularity level, this is the only suggestion given by ML4PG to prove Lemma 2
since, apart from the step to provide the witness, the proofs of both lemmas are
practically the same.



Therefore, ML4PG can capture non-trivial mathematical proof patterns arising
across different data types, a variety of lemma shapes and libraries. Moreover,
it allows the user to change machine-learning settings to obtain clusters of
varied precision.

4 Proof Patterns for Importing Proof Methods 

There is a trend in ITPs to develop general purpose methodologies to aid in the 
formalisation of a family of related proofs. However, although the application 
of a methodology is straightforward for its developers, it is usually difficult for 
an external user to decipher the key results to import such a methodology into 
a new development. Therefore, tools which can capture methods and suggest 
appropriate lemmas based on proof patterns would be valuable. Here, we show 
how ML4PG can be useful in this context with an example coming from the 
formal proof of the correctness of a Computer Algebra algorithm. 

Most algorithms in modern Computer Algebra systems are designed to be 
efficient, and this usually means that their verification is not an easy task. In 
order to overcome this problem, a methodology based on the idea of refinements 
was presented in [7] , and was implemented as a new library, built on top of the 
SSReflect libraries, called CoqEAL. Roughly speaking, the approach to formalise 
efficient algorithms followed in [7] can be split into three steps:

S1. define the algorithm relying on rich dependent types, as this will make the
proof of its correctness easier;

S2. refine such a definition to an efficient algorithm described on high-level data
structures; and,

S3. implement it on data structures which are closer to machine representations.

The CoqEAL methodology is clear and the authors have shown that it can be
extrapolated to different problems. Nevertheless, this library contains
approximately 400 definitions and 700 lemmas; hence, searching for proof
strategies inside this library is not a simple task that could be undertaken
manually.

In order to illustrate this, let us consider the formalisation of a fast algorithm
to compute the inverse of triangular matrices over a field with 1s on the diagonal
using the CoqEAL methodology. Our interest in the inverse of this kind of
matrices comes from its application in the context of Discrete Morse Theory [10],
where they are used to speed up the computation of homology groups. SSReflect
already implements the matrix inverse relying on rich dependent types using the
invmx function; hence, we only need to focus on the second and third steps of the
CoqEAL methodology.

Using an algorithm specially designed to efficiently compute the inverse of
triangular matrices with 1s on the diagonal, we define a function called
fast_invmx using high-level data structures.

Algorithm 1 Let M be a square triangular matrix of size n with 1s on the
diagonal; then fast_invmx(M) is recursively defined as follows.

- If n = 0, then fast_invmx(M) = 1%:M (where 1%:M is the notation for the identity
matrix in SSReflect).

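- Otherwise, we can decompose M into a matrix with four components: the
top-left element, which is 1; the top-right row vector, which is null; the
bottom-left column vector C; and the bottom-right (n - 1) x (n - 1) matrix N.
Then fast_invmx(M) is the block matrix

$$\mathtt{fast\_invmx}(M) = \begin{pmatrix} 1 & 0 \\ -\mathtt{fast\_invmx}(N) \mathbin{*m} C & \mathtt{fast\_invmx}(N) \end{pmatrix},$$

where *m is the notation for matrix multiplication in SSReflect.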
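
The recursion can be mirrored outside Coq as follows. This is only a sketch over plain Python lists of integers, not the SSReflect or CoqEAL definition. It assumes that M is lower triangular with 1s on the diagonal, that N names the bottom-right block, and that the bottom-left block of the inverse is -(fast_invmx(N) *m C), which is what the block inverse of such a matrix requires.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def fast_invmx(m):
    """Inverse of a lower triangular matrix with 1s on the diagonal."""
    n = len(m)
    if n == 0:
        return []                        # the empty (0 x 0) identity matrix
    c = [row[0] for row in m[1:]]        # bottom-left column vector C
    sub = [row[1:] for row in m[1:]]     # bottom-right block N
    inv_sub = fast_invmx(sub)            # recursive call on the smaller block
    # Bottom-left block of the inverse: -(fast_invmx(N) *m C).
    col = [-sum(inv_sub[i][k] * c[k] for k in range(n - 1))
           for i in range(n - 1)]
    top = [1] + [0] * (n - 1)            # top row: 1 followed by the null vector
    return [top] + [[col[i]] + inv_sub[i] for i in range(n - 1)]
```

For example, `fast_invmx([[1, 0, 0], [2, 1, 0], [3, 4, 1]])` can be multiplied back with the input using `mat_mul` to check that the product is the identity matrix.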

Subsequently, we should prove the equivalence between the functions invmx 
and fast_invmx. Proving this result is not trivial due to the different nature 
of the algorithms: the former is a general algorithm to compute the inverse of 
matrices - using adjugate matrices and determinants; on the contrary, the latter 
is an ad-hoc efficient algorithm for a special case of triangular matrices, which 
takes advantage of the shape of those matrices to obtain their inverse. 

In the CoqEAL library, there are just three lemmas devoted to proving the
equivalence between a matrix algorithm and its efficient version. Namely, those 
lemmas are related to the multiplication, the rank and the determinant of ma- 
trices. However, the strategies followed to prove those equivalence lemmas are 
ad-hoc for the concrete algorithms, and the only common step which could be 
reused from them in our concrete case is the application of induction on the 
size of the matrix. Therefore, in this situation, it makes sense to ask ML4PG 
for some hint that could help us to tackle the proof before trying to prove it by 
brute force. 

We configure ML4PG as follows. First of all, we consider both the matrix 
library of SSReflect and the CoqEAL library for clustering - they involve approx- 
imately 1000 lemmas. This time, we connect ML4PG to the Gaussian clustering 
algorithm in Matlab. Moreover, we use a bottom-up approach to configure the 
granularity parameter. Finally, in order to configure the frequencies parameter, 
our experience shows that the analysis of frequencies may give two opposite 
effects. 

- On the one hand, high frequencies suggest that the proofs found in clusters
have a high correlation, and that is a desirable property.

- On the other hand, proofs with too high a correlation may be too trivial to
provide interesting proof hints. Therefore, it is sometimes useful to look
for proof clusters with lower frequencies - as they may potentially contain
those non-trivial analogies.

Example 5 (Proof frequencies). For our CoqEAL experiments, we start using 5 as
the granularity parameter and 3 as the frequencies parameter (see Table 3); with
these settings, ML4PG will provide similar lemmas with a high correlation among
them. However, ML4PG does not find any proof cluster for our proof. Therefore,
we decrease the granularity and frequencies parameters to 3 and 2, respectively,
in order to obtain more general clusters with lower correlation. Using these
settings, ML4PG suggests 10 lemmas. Three of them are the ones about efficient
multiplication, rank and determinant; but, as we have previously said, they do
not provide any hint to finish our proof. Among the most interesting suggestions
was the following unicity lemma.

Lemma 5 Let $M_1$ and $M_2$ be two square matrices such that $M_1 \times M_2 = 1$
(where 1 is the identity matrix); then, $M_2$ is the inverse of $M_1$.

Then, to prove the equivalence between invmx and fast_invmx, it is enough
to prove that, given a triangular matrix M, M *m fast_invmx(M) = 1%:M. This
result is easy to prove.

Proof. Apply induction on the size of the matrix. 

- The base case is trivial.

- In the inductive case, M *m fast_invmx(M) is equal to:
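
The block computation behind the inductive case can be sketched as follows. It is a reconstruction under our own assumptions, not the authors' displayed equation: we write N for the bottom-right block of M and assume that the recursive case of fast_invmx has bottom-left block $-\mathtt{fast\_invmx}(N) \mathbin{*m} C$.

```latex
\begin{aligned}
M \mathbin{*m} \mathtt{fast\_invmx}(M)
  &= \begin{pmatrix} 1 & 0 \\ C & N \end{pmatrix}
     \mathbin{*m}
     \begin{pmatrix} 1 & 0 \\
       -\mathtt{fast\_invmx}(N) \mathbin{*m} C & \mathtt{fast\_invmx}(N)
     \end{pmatrix} \\
  &= \begin{pmatrix} 1 & 0 \\
       C - N \mathbin{*m} \mathtt{fast\_invmx}(N) \mathbin{*m} C
         & N \mathbin{*m} \mathtt{fast\_invmx}(N)
     \end{pmatrix}.
\end{aligned}
```

If the inductive hypothesis gives $N \mathbin{*m} \mathtt{fast\_invmx}(N) = 1$, the bottom-left block becomes $C - C = 0$ and the bottom-right block becomes the identity, so the whole product is the identity matrix.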



Therefore, ML4PG does not provide here any proof pattern and the correla- 
tion of our current proof with the suggested lemmas is not too high, but it has 
helped us to formulate an auxiliary lemma to finish our original proof. 

Once we have proven the equivalence between the two matrix inverse
algorithms, we can focus on the third step of the CoqEAL methodology. It is worth
mentioning that neither invmx nor fast_invmx can be used to actually compute
the inverse of matrices. These functions cannot be executed since the definition of
matrices is locked in SSReflect to avoid triggering heavy computations during
deduction steps. Using step S3 of the CoqEAL methodology, we can overcome
this pitfall. In our case, we implement the function cfast_invmx using lists of
lists as the low-level data type for representing matrices.

CoqEAL provides the executable counterpart of most of the matrix opera- 
tions included in the matrix library of SSRcflect. Moreover, the correctness of 
those functions is proved through translation lemmas, which state the equiva- 
lence between the executable version and the abstract version. Then, the imple- 
mentation of cf ast_invmx is almost a direct translation of f ast_invmx using 
the executable counterparts of matrix operations provided by CoqEAL; and, the 
proof of correctness can be achieved applying the translations lemmas of the op- 
erations involved in the definition of the algorithm (see [7]). If ML4PG is called, 
it finds that all the translation lemmas form a unique cluster. 





Applying the inductive hypothesis, the result is proven. 



□ 



Therefore, ML4PG can help to import proof methods into new developments 
in two different ways. First of all, ML4PG can detect typical proof patterns 
which are followed in a methodology (the proof pattern of translation lemmas). 
Moreover, even in the cases where there is no common proof strategy, ML4PG 
can suggest lemmas that help in the formulation of auxiliary results, making the 
proof development easier. 



5 Proof Patterns in Industrial Proofs 

ITPs have been successfully used in industry to verify the correctness of hardware
and software systems. In this context, proofs usually have a certain regularity
involving several routine cases and similar lemmas. However, the proofs are often
developed by a team, where users have their own lists of definitions and lemmas
in different notations. Thus, in team-based developments, it would be extremely
helpful to use a tool that could detect proof patterns across different users,
notations and libraries.

We examine the suitability of ML4PG for this task with the formalisation of 
a simple model of the Java Virtual Machine in Coq/SSReflect. The Java Virtual 
Machine (JVM) [17] is a stack-based abstract machine which can execute Java 
byte code. We have modelled an interpreter for JVM programs in Coq. From
now on, we refer to our machine as "CJVM" (for Coq JVM). 

Given a specific Java method, we can translate it to Java byte code using a 
tool such as javac of Sun Microsystems. Such a byte code can be executed in 
CJVM provided a schedule, and the result will be the state of the JVM at the 
end of the schedule. Moreover, we can prove theorems about the CJVM model 
behaviour when interpreting that byte code. The byte code associated with the 
factorial program can be seen in Figure 1.



static int factorial(int n)
{
    int a = 1;
    while (n != 0) {
        a = a * n;
        n = n - 1;
    }
    return a;
}

Fixpoint helper_fact (n a : nat) :=
  match n with
  | 0 => a
  | S p => helper_fact p (n * a)
  end.

Definition fn_fact (n : nat) :=
  helper_fact n 1.



Fig. 1. Factorial function. Left: Java program for computing the factorial of
natural numbers. Centre: Java byte code associated with the Java program. Right:
tail-recursive version of the factorial function in Coq.



The state of the CJVM consists of 4 fields: a program counter (a natural
number), a set of registers called locals (implemented as a list of natural
numbers), an operand stack (a list of natural numbers), and the byte code program
of the method being evaluated.
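
A minimal sketch of such a state in plain Coq may help fix ideas. It is not the
CJVM development itself: the three-instruction set instr, the names mkState,
code, step and run, and the modelling of a schedule as a plain number of steps
are all assumptions made for illustration.

```coq
Require Import List.
Import ListNotations.

(* Hypothetical instruction set; the real CJVM instructions may differ. *)
Inductive instr : Type :=
  | ICONST (k : nat)   (* push the constant k *)
  | ILOAD (i : nat)    (* push the contents of local register i *)
  | IMUL.              (* pop two values and push their product *)

(* The four fields described in the text. *)
Record state : Type := mkState {
  pc     : nat;         (* program counter *)
  locals : list nat;    (* registers *)
  stack  : list nat;    (* operand stack, top element first *)
  code   : list instr   (* byte code of the method being evaluated *)
}.

(* Execute the instruction at the program counter; halt (return the
   state unchanged) if there is none or the stack is too short. *)
Definition step (s : state) : state :=
  match nth_error (code s) (pc s) with
  | Some (ICONST k) =>
      mkState (S (pc s)) (locals s) (k :: stack s) (code s)
  | Some (ILOAD i) =>
      mkState (S (pc s)) (locals s) (nth i (locals s) 0 :: stack s) (code s)
  | Some IMUL =>
      match stack s with
      | x :: y :: rest =>
          mkState (S (pc s)) (locals s) (y * x :: rest) (code s)
      | _ => s
      end
  | None => s
  end.

(* Assumption: a schedule is simply the number of steps to run. *)
Fixpoint run (n : nat) (s : state) : state :=
  match n with
  | 0 => s
  | S k => run k (step s)
  end.

(* Running three steps multiplies the constant 3 by local register 0. *)
Example run_mul :
  stack (run 3 (mkState 0 [4] [] [ICONST 3; ILOAD 0; IMUL])) = [12].
Proof. reflexivity. Qed.
```

Theorems about a program are then statements about the state returned by run,
in the style of Lemma 6 below.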

Java byte code, like that presented in Figure 1, can be executed within
CJVM. However, more interesting than merely executing Java byte code is that we
can prove the correctness of the implementation of Java byte code programs
using Coq. For instance, in the case of the factorial program, we can prove the
following theorem, which states the correctness of the factorial byte code.

Lemma 6 $\forall n \in \mathbb{N}$, CJVM produces a state which contains $n!$ on
top of the stack when running the byte code associated with the factorial
program with $n$ as input.

The proofs of theorems like the one above always follow the same methodology,
imported from ACL2 proofs about Java Virtual Machines [12], which
consists of the following three steps.

(1) Write the specification of the function, write the algorithm, and prove that 
the algorithm satisfies the specification. 

(2) Write the JVM program within Coq, define the function that schedules the
program (this function will make CJVM run the program to completion as
a function of the input to the program), and prove that the resulting code
implements this algorithm.

(3) Prove total correctness of the Java byte code. 

Using this methodology, we have proven the correctness of a dozen programs
related to arithmetic (multiplication of natural numbers, exponentiation
of natural numbers and so on). The proof of each theorem was done independently
of the others to model a distributed proof development.

Therefore, we simulated the following scenario. Suppose a new developer
tackles the proof of Lemma 6 for the first time; she knows the general
methodology to prove it and has access to the library of programs previously
proven by other users. This situation is similar to the one presented in Section 4,
with an additional problem: the different notations employed by different users
obscure some common features. ML4PG would be a good alternative to the
manual search for proof patterns.

The ML4PG settings that allow us to find interesting proof patterns are the
same for all three steps listed above. We consider all the libraries related to the
proofs of Java byte code programs (they involve approximately 150 lemmas). We
connect ML4PG to the k-means library in Matlab. Moreover, we use the default
value of 3 for the granularity parameter and 3 for the frequencies parameter.

Let us focus on the first step of the methodology, that is, the proof of the
equivalence between the specification of the factorial function (which is already
defined in SSReflect) and the algorithm. The Java factorial function is
iterative; hence, the algorithm is written in Coq as a tail-recursive function,
see the right side of Figure 1. In general, all the tail-recursive functions are
defined using an auxiliary function, called the helper, and a wrapper for such a
function. The suggestions provided by ML4PG in this case are the proofs of step
(1) for three iterative programs: the multiplication, the exponentiation and the
power of natural numbers. All of them follow the same proof strategy, which can
also be applied in the case of factorial:

Proof Strategy 2 Prove an auxiliary lemma about the helper considering the 
most general case. For example, if the helper function is defined with formal 
parameters n, m, and a and the wrapper calls the helper initializing a to 0, the 
helper theorem must be about (helper n m a), not just about the special case 
(helper n m 0). Subsequently, instantiate the lemma for the concrete case. 
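
For the factorial function of Figure 1, Proof Strategy 2 can be sketched as
follows. This is an illustration in plain Coq rather than the SSReflect proof
from the development: the standard-library function fact stands in for the
SSReflect factorial, and the lemma names helper_fact_general and
fn_fact_correct are hypothetical.

```coq
Require Import Arith Factorial.

(* Helper and wrapper, as on the right of Figure 1. *)
Fixpoint helper_fact (n a : nat) : nat :=
  match n with
  | 0 => a
  | S p => helper_fact p (n * a)
  end.

Definition fn_fact (n : nat) : nat := helper_fact n 1.

(* General lemma about the helper: the accumulator a is arbitrary.
   Keeping a universally quantified makes the induction hypothesis
   applicable to the updated accumulator in the inductive case. *)
Lemma helper_fact_general :
  forall n a, helper_fact n a = fact n * a.
Proof.
  induction n as [| p IHp]; intros a.
  - simpl. ring.
  - simpl. rewrite IHp. ring.
Qed.

(* Instantiate the general lemma to the concrete case a = 1. *)
Lemma fn_fact_correct : forall n, fn_fact n = fact n.
Proof.
  intros n. unfold fn_fact. rewrite helper_fact_general. ring.
Qed.
```

Stating the lemma only for (helper_fact n 1) would leave the induction stuck,
because the recursive call uses the accumulator (n * a) rather than 1.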

In order to prove that the Java byte code implements the factorial algorithm,
ML4PG suggests 4 lemmas that are used for that purpose in other programs. All of
those programs are iterative and involve a loop. The strategy followed in those
proofs is the following one.

Proof Strategy 3 Prove that the loop implements the helper using an auxiliary
lemma. Such a lemma about the loop must consider the general case, as in
Proof Strategy 2. Subsequently, instantiate the result to the concrete case.

Finally, ML4PG finds that all the lemmas involved in the proof of the total
correctness of the programs for different functions are similar and follow the same
proof pattern, which consists in applying the lemmas obtained from steps (1) and
(2). Following these guidelines, Lemma 6 can be formalised in Coq by analogy
with other similar proofs, obtaining as a result the proof of the correctness of
the factorial Java byte code; see [11] for the full proof.

6 Conclusions and Further work 

In this paper, we have presented three examples, of very different nature, to test
the capabilities of statistical proof-pattern recognition. Various technical details
of the ML4PG implementation may be subject to change in the future: e.g. the
feature extraction mechanism or the clustering algorithms may be further tuned.
Our experiments convince us that a tool like ML4PG can be a practical addition
to interactive proof development; its fully functional version can be downloaded
from [11]. Among the methods that are crucial for the success of ML4PG are the
proof trace and the proof-patch methods, as well as the model of interactive and
modular interfacing between Proof General and machine-learning engines.

The proof trace method is general enough to be applied to other ITPs, such
as Isabelle/HOL or HOL4, without any special hindrance. In addition, Proof
General provides a common interface for several ITPs. Therefore, it is appealing
to use machine-learning techniques to automatically find and import proof
patterns across various ITPs in the future.

A more technical line of future work is the development of ML4PG
as a distributed tool. As we have shown in Section 5, ML4PG can be helpful
for team-based developments. However, the current implementation of ML4PG is
centralised; this means that the user can only obtain proof clusters of the
libraries available on her computer. We therefore think that a client-server
architecture, where the proof information is shared among several users, could
also be useful.